Vector representations and vector space modeling (VSM) play a central role in modern machine learning. We propose a novel approach to 'vector similarity searching' over dense semantic representations of words and documents that can be deployed on top of traditional inverted-index-based fulltext engines, taking advantage of their robustness, stability, scalability and ubiquity. We show that this approach allows the indexing and querying of dense vectors in text domains. This opens up exciting avenues for major efficiency gains, along with simpler deployment, scaling and monitoring. The end result is a fast and scalable vector database with a tunable trade-off between vector search performance and quality, backed by a standard fulltext engine such as Elasticsearch. We empirically demonstrate its querying performance and quality by applying this solution to the task of semantic searching over a dense vector representation of the entire English Wikipedia.